docs: repro: add --pull #1841

efiop · 2020-10-05T23:27:36Z

Per iterative/dvc#4538

content/docs/command-reference/repro.md

shcheklein · 2020-10-06T00:26:47Z

@@ -154,6 +155,9 @@ up-to-date and only execute the final stage.
  corresponding pipelines, including the target stages themselves. This option
  has no effect if `targets` are not provided.

+- `--pull` - try automatically pulling cached outputs if they are not present in


OK, a few questions:

1. try - what happens if it fails?

2. pulling - probably make it a link to `dvc pull`

3. cached outputs - not sure if it's better to say "DVC-tracked outputs" here. (Otherwise it's a bit hard to parse when reading, since they are "cached" but not present in the cache.)

WDYT?

@jorgeorpinel ?

As before, it will simply not restore from run-cache.

Addressed 2 and 3. Thank you.

Yep, got it. I guess it's fine for now. No reason to improve this further since we don't have run-cache documented anywhere. So we can keep it as is, as an advanced option.

> we don't have run-cache documented anywhere

BTW the run-cache is already mentioned in 6 cmd refs (published) and in the Data Pipelines page of the GS, which I only just noticed. I thought we were not going to include any info about experiments until it's more stable? Should we remove these mentions or prioritize documenting run-cache? Thanks

@jorgeorpinel Let's not remove it. It is in a semi-official state, and people already use it due to CML and other sources. We are on our way to cleaning up the UI overall and publishing experiments.

Run-cache doc by itself doesn't really mean anything to the users, which is why I didn't write it in the summer. It only makes sense in particular commands, so the doc about run-cache internals could wait for the high-level commands.

> Let's not remove it. It is in a semi-official state

OK, I agree it's best to keep it, but it could be problematic that the run-cache mentions are completely out of context (no explanation of the concept anywhere).

> Run-cache doc by itself doesn't really mean anything to the users... the doc about run-cache internals could wait...

Much disagree 🐶 I mean it's not so important whether it's a stand-alone doc or a new section in existing doc(s), but the basics about run-cache seem like a quite important thing to document to me.

> It only makes sense in particular commands

Yeah, anywhere we want to put it would be great, as long as it's published, since this is already semi-official.

@jorgeorpinel Agreed, I've added the run-cache doc ticket to next sprint, just preliminarily. Thanks 🙂

Many thanks! 🦊

content/docs/command-reference/repro.md

jorgeorpinel · 2020-10-14T17:42:09Z

+- `--pull` - try automatically [pulling](/doc/command-reference/pull) missing
+  cache for outputs restored from run-cache.


Back on this. Per iterative/dvc#4538 (comment):

> dvc repro --pull pulls regular files, hashes for which might've been restored from the existing run-cache, so kinda like regular dvc pull

Unfortunately I don't understand either of the explanations. What's the relationship between run-cache and repro --pull? Maybe a step-by-step explanation like:

1. Use repro --pull
2. run-cache is checked before executing commands (default repro behavior, I think)
3. Some output hashes are found? (But not the actual files? This is the confusing part.)
4. Hashes are looked for in the cache but not found
5. The files are looked for in remote storage

Something like that.

Please @efiop ! Thanks in advance

@jorgeorpinel Even if we leave the run-cache out, repro --pull would still try to dvc pull outputs that are missing when the pipeline itself didn't change. E.g. when you forgot to dvc pull beforehand and you are trying to dvc repro an otherwise up-to-date pipeline, dvc repro --pull will just pull the outputs for such stages instead of trying to reproduce them.

Run-cache is then just a special source of lock files, and repro --pull works the same way as explained above.
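
A minimal sketch of the behavior described above, written as a toy Python model. The function name `plan_stage`, its parameters, and the hash values are hypothetical and are not DVC internals. Falling back to re-running the stage when the missing outputs cannot be pulled is an assumption, based on the earlier remark that a failed attempt simply does not restore anything.

```python
# Toy model only: names and data structures are hypothetical, not DVC internals.

def plan_stage(stage_changed, locked_outs, local_cache, remote, pull=False):
    """Decide what a repro-like command would do for a single stage.

    locked_outs: output hashes recorded in the lock file
    local_cache, remote: sets of hashes available in each location
    """
    if stage_changed:
        return "run"  # deps or command changed: the stage must be reproduced
    missing = [h for h in locked_outs if h not in local_cache]
    if not missing:
        return "skip"  # up to date and all outputs are in the local cache
    if pull and all(h in remote for h in missing):
        return "pull"  # fetch the missing outputs instead of re-running
    return "run"  # assumption: outputs unavailable, so reproduce the stage


locked = ["abc123", "def456"]
remote = {"abc123", "def456"}

# Pipeline unchanged, one output missing locally, --pull given.
print(plan_stage(False, locked, {"abc123"}, remote, pull=True))   # pull
# Same situation without --pull.
print(plan_stage(False, locked, {"abc123"}, remote, pull=False))  # run
```

In this model the run-cache would only be another way to obtain `locked_outs`; the pull decision itself stays the same.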

Want to point out again that --pull is still a temporary solution that was needed to improve pull --run-cache that is also not complete in a product sense. So I would recommend not spending much time on this, as the product scenario is WIP and there is no reason to optimize the docs for it too much.

OK it makes more sense now, thanks.

In this case I do feel like I need to spend enough time understanding what's going on so that when the coming bulk of docs related to new features hits, I'm better prepared. So thanks again for bearing with me!

Last Q @efiop. Does this only check the default remote (if one is set)? Or all remotes?

Actually, 2 more questions...

Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?

What happens if you do repro --pull --no-run-cache? Is the run-cache check skipped?

Thanks!

> Does this only check the default remote (if one is set)? Or all remotes?

Only the default remote right now.

> Does it check only the local run-cache? Or also the remote run-cache for possible dep/out hashes?

Yes, only local run-cache.

> What happens if you do repro --pull --no-run-cache? Is the run-cache check skipped?

Correct. It will only pull if you have your lock file complete (so hashes are already there, just the outputs are missing from cache), but won't try to use run-cache.
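
The three answers above can be sketched as a small Python model. Everything here is hypothetical (the function names, the dictionary-based lock file, run-cache, and remotes); it only illustrates the stated rules: hashes come from a complete lock file or, unless `--no-run-cache` is given, from the local run-cache, and only the default remote is checked.

```python
# Hypothetical model of the answers above; not DVC's actual code.

def resolve_out_hashes(stage, lock_file, local_run_cache, no_run_cache=False):
    """Return the output hashes known for a stage, or None if unknown."""
    if stage in lock_file:
        return lock_file[stage]            # complete lock file: hashes already there
    if not no_run_cache:
        return local_run_cache.get(stage)  # only the local run-cache is consulted
    return None                            # --no-run-cache: run-cache check skipped


def can_pull(stage, lock_file, local_run_cache, remotes, default_remote,
             no_run_cache=False):
    """True if every known output hash is available in the default remote."""
    hashes = resolve_out_hashes(stage, lock_file, local_run_cache, no_run_cache)
    if hashes is None or default_remote is None:
        return False
    available = remotes.get(default_remote, set())  # other remotes are ignored
    return all(h in available for h in hashes)


remotes = {"origin": {"h1"}, "backup": {"h2"}}
run_cache = {"train": ["h1"]}

print(can_pull("train", {}, run_cache, remotes, "origin"))                     # True
print(can_pull("train", {}, run_cache, remotes, "origin", no_run_cache=True))  # False
print(can_pull("train", {"train": ["h2"]}, {}, remotes, "origin"))             # False
```

The last call returns `False` because `h2` exists only in the non-default `backup` remote of this toy setup.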

Please feel free to ask any questions, I do understand that this incomplete feature is a bit confusing.

That's all I can think of for now. Thanks @efiop! Updated in https://github.com/iterative/dvc.org/pull/1881/files#diff-6c1f3192f09e2722ba169e9fa219b3b5158bbafa470b382c2d6135db7aa1e20d.

per #1841 (comment)

docs: repro: add --pull

e7f8f51

Per iterative/dvc#4538

shcheklein deployed to dvc-landing-repro-pull-uuofdyr October 5, 2020 23:27

efiop mentioned this pull request Oct 5, 2020

repro: support --pull iterative/dvc#4538

Merged

shcheklein reviewed Oct 5, 2020

content/docs/command-reference/repro.md (outdated)

shcheklein reviewed Oct 6, 2020

update message

f1eb8c4

shcheklein temporarily deployed to dvc-landing-repro-pull-uuofdyr October 6, 2020 00:46 Inactive

shcheklein approved these changes Oct 6, 2020

shcheklein merged commit 15aa177 into master Oct 6, 2020

efiop deleted the repro_pull branch October 6, 2020 01:05

jorgeorpinel reviewed Oct 9, 2020

content/docs/command-reference/repro.md

jorgeorpinel reviewed Oct 14, 2020

jorgeorpinel mentioned this pull request Oct 20, 2020

Misc. updates #1881

Merged

jorgeorpinel added a commit that referenced this pull request Oct 21, 2020

cmd: further clarify about repro --pull

76b8189

per #1841 (comment)

efiop commented Oct 5, 2020

shcheklein Oct 6, 2020

efiop Oct 6, 2020

shcheklein Oct 6, 2020

jorgeorpinel Oct 9, 2020

efiop Oct 9, 2020 (edited)

jorgeorpinel Oct 11, 2020 (edited)

efiop Oct 12, 2020

jorgeorpinel Oct 12, 2020

jorgeorpinel Oct 14, 2020 (edited)

efiop Oct 18, 2020

jorgeorpinel Oct 20, 2020

jorgeorpinel Oct 20, 2020

jorgeorpinel Oct 20, 2020

efiop Oct 20, 2020

jorgeorpinel Oct 21, 2020